Supplementary Material -- Towards Reliable Model Selection for Unsupervised Domain Adaptation: An Empirical Study and A Certified Baseline
We first prove the first inequality using Jensen's inequality, which states that for a real-valued, convex function f and a random variable X, f(E[X]) <= E[f(X)]. Next, we leverage the properties of inequalities to prove the second inequality. Directly taking the source risk as the target risk is unreliable due to the distribution shift between the source and target domains. Reverse Validation instead performs a reversed adaptation from the pseudo-labeled target domain back to the source and uses the source risk of this reversed adaptation task for validation. However, this method has limited effectiveness in scenarios with severe domain shift between the source and target domains. (This work was completed while Dapeng (lhxxhb15@gmail.com) ...)
- Europe > Greece (0.04)
- Asia > Southeast Asia (0.04)
- Oceania > New Zealand (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Questionnaire & Opinion Survey (0.93)
- Personal > Obituary (0.45)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Economy (1.00)
- (3 more...)
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Wei, Tianwen, Zhu, Bo, Zhao, Liang, Cheng, Cheng, Li, Biye, Lü, Weiwei, Cheng, Peng, Zhang, Jianhao, Zhang, Xiaoyu, Zeng, Liang, Wang, Xiaokun, Ma, Yutuan, Hu, Rui, Yan, Shuicheng, Fang, Han, Zhou, Yahui
In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initializations. Our findings suggest that the choice between these two approaches should consider both the performance of the existing dense checkpoints and the MoE training budget. We highlight two innovative techniques: gating logit normalization, which improves expert diversification, and adaptive auxiliary loss coefficients, allowing for layer-specific adjustment of auxiliary loss coefficients. Our experimental results validate the effectiveness of these methods. Leveraging these techniques and insights, we trained our upcycled Skywork-MoE on a condensed subset of our SkyPile corpus. The evaluation results demonstrate that our model delivers strong performance across a wide range of benchmarks.
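The gating logit normalization mentioned above can be sketched roughly as follows: each token's expert logits are standardized to zero mean and unit variance before the softmax, with a scale that controls routing sharpness. The function name, the `lam` scale, and the epsilon are assumptions for illustration, not taken from the report:

```python
import numpy as np

def normalized_gating(logits, lam=1.0, eps=1e-6):
    """Sketch of gating logit normalization for MoE routing.

    Standardizes each token's expert logits (last axis) to zero mean
    and unit variance, scales by `lam`, then applies a numerically
    stable softmax. A larger `lam` concentrates routing on fewer
    experts; a smaller one flattens the routing distribution.
    """
    mu = logits.mean(axis=-1, keepdims=True)
    sigma = logits.std(axis=-1, keepdims=True)
    z = lam * (logits - mu) / (sigma + eps)
    # Stable softmax over the expert axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Because the standardized logits always have the same scale before the softmax, `lam` acts as a single knob for how peaked the gate is, which is one plausible mechanism for the improved expert diversification the report describes.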
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States (0.04)
- Asia > Middle East > Jordan (0.04)
Explaining Veracity Predictions with Evidence Summarization: A Multi-Task Model Approach
Cekinel, Recep Firat, Karagoz, Pinar
The rapid dissemination of misinformation through social media has increased the importance of automated fact-checking. Furthermore, studies on what deep neural models attend to when making predictions have grown in recent years. While significant progress has been made in this field, it has not yet reached a level of reasoning comparable to human reasoning. To address these gaps, we propose a multi-task explainable neural model for misinformation detection. Specifically, this work formulates the generation of explanations for the model's veracity predictions as a text summarization problem. Additionally, the performance of the proposed model is discussed on publicly available datasets, and the findings are evaluated against related studies.
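As a rough illustration of the multi-task setup, such models typically optimize a weighted sum of the veracity-classification loss and the summarization loss; the helper below and its `alpha` balancing weight are hypothetical, not the paper's actual formulation:

```python
def multitask_loss(veracity_loss, summary_loss, alpha=0.5):
    """Illustrative joint objective for a multi-task fact-checking model.

    Combines the classification (veracity) loss and the explanation
    (summarization) loss; `alpha` trades off the two tasks.
    """
    return alpha * veracity_loss + (1.0 - alpha) * summary_loss
```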
- Europe > United Kingdom (0.14)
- North America > United States > Montana (0.04)
- North America > United States > California (0.04)
- (2 more...)
- Media > News (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Health & Medicine > Therapeutic Area > Neurology > Multiple Sclerosis (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Communications > Social Media (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Mitigating Negative Transfer in Multi-Task Learning with Exponential Moving Average Loss Weighting Strategies
Lakkapragada, Anish, Sleiman, Essam, Surabhi, Saimourya, Wall, Dennis P.
Multi-Task Learning (MTL) is a growing subject of interest in deep learning, due to its ability to train models on multiple tasks more efficiently than a group of conventional single-task models. However, MTL can be impractical, as certain tasks can dominate training and hurt performance on others, making some tasks perform better in a single-task model than in a multi-task one. Such problems are broadly classified as negative transfer, and many prior approaches have been proposed in the literature to mitigate them. One current approach to alleviating negative transfer is to weight each of the losses so that they are on the same scale. While current loss-balancing approaches rely on either optimization or complex numerical analysis, none directly scales the losses based on their observed magnitudes. We propose multiple techniques for loss balancing based on scaling by the exponential moving average and benchmark them against current best-performing methods on three established datasets. On these datasets, they achieve comparable, if not higher, performance than current best-performing methods.
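A minimal sketch of the idea of scaling losses by their observed magnitudes: each task loss is divided by an exponential moving average of its own value, so all tasks contribute on a comparable scale. The class name, the `beta` decay, and the epsilon are illustrative assumptions, not the paper's exact method:

```python
class EMALossWeighter:
    """Sketch of EMA-based loss balancing for multi-task learning.

    Maintains an exponential moving average of each task's loss and
    weights each loss by the inverse of its EMA, so that a task whose
    raw loss is orders of magnitude larger no longer dominates training.
    """

    def __init__(self, num_tasks, beta=0.9, eps=1e-8):
        self.beta = beta
        self.eps = eps
        self.ema = [None] * num_tasks

    def weights(self, losses):
        out = []
        for i, loss in enumerate(losses):
            if self.ema[i] is None:
                self.ema[i] = loss  # initialize EMA with the first observation
            else:
                self.ema[i] = self.beta * self.ema[i] + (1.0 - self.beta) * loss
            out.append(1.0 / (self.ema[i] + self.eps))
        return out

    def balanced_total(self, losses):
        # Each scaled term is close to 1, so no single task dominates.
        return sum(w * loss for w, loss in zip(self.weights(losses), losses))
```

On the first step each scaled term equals roughly 1 regardless of the raw loss magnitude; afterwards the EMA tracks slow drifts in scale while still letting relative within-task progress show through.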
- North America > United States > California > Yolo County > Davis (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
Building Intelligent Autonomous Navigation Agents
Breakthroughs in machine learning in the last decade have led to `digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of `physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.
- South America > Uruguay > Maldonado > Maldonado (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (7 more...)
- Research Report > New Finding (0.92)
- Instructional Material (0.92)
- Workflow (0.92)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Education > Educational Setting > Online (0.67)
- Health & Medicine > Therapeutic Area (0.67)
- Government > Regional Government > North America Government > United States Government (0.45)
Miej/Dynamic_Neural_Manifold
In this project, I've built a neural network architecture with a static execution graph that acts as a dynamic neural network, in which connections between the various neurons are controlled by the network itself. This is accomplished by manipulating the adjacency matrix representation of the network on a per-neuron basis, with cell elements representing a 'distance', and masking off connections that are within a threshold. Including a loss term based on the network's sparsity or processing time allows the architecture to optimize its structure for accuracy or speed. Alright, so hopefully I've caught your attention with the title. To begin, I'd like to explain a little about why I've created this. My educational background is actually in the sciences, just at the junction between chemistry and physics.
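A minimal sketch of the adjacency-masking idea, assuming each matrix entry holds a learned 'distance' and that, as the description puts it, connections within the threshold are masked off; the function names and the sparsity coefficient are illustrative, not taken from the repository:

```python
import numpy as np

def masked_adjacency(distance, threshold):
    """Zero out connections whose 'distance' entry falls within the
    threshold (per the project description), keeping the rest active.
    The result is a binary adjacency mask over neuron pairs.
    """
    return (distance > threshold).astype(float)

def sparsity_loss(mask, coef=0.01):
    """Penalty proportional to the fraction of active connections,
    giving the network an incentive to prune its own structure."""
    return coef * mask.mean()
```

Because the mask is derived from values the network itself produces, adding `sparsity_loss` to the task loss lets gradient descent trade accuracy against connection count, which matches the accuracy-versus-speed trade-off described above.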